
    Source separation with one ear: proposition for an anthropomorphic approach

    Abstract: We present an example of an anthropomorphic approach in which auditory-based cues are combined with temporal correlation to implement a source separation system. The auditory features are based on spectral amplitude modulation and energy information obtained through 256 cochlear filters. Segmentation and binding of auditory objects are performed with a two-layered spiking neural network. The first layer performs the segmentation of the auditory images into objects, while the second layer binds the auditory objects belonging to the same source. The binding is further used to generate a mask (binary gain) to suppress the undesired sources from the original signal. Results are presented for a double-voiced (2 speakers) speech segment and for sentences corrupted with different noise sources. Comparative results are also given using PESQ (perceptual evaluation of speech quality) scores. The spiking neural network is fully adaptive and unsupervised.
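    To make the mask (binary gain) stage concrete, here is a minimal, hypothetical Python sketch of applying a per-channel binary gain to a filterbank decomposition and resynthesizing the kept channels. It is not the authors' code: the paper derives the mask from a spiking neural network over 256 cochlear filters, whereas this sketch uses a small Butterworth filterbank and a placeholder mask.

```python
# Sketch of the binary-gain (mask) stage of a filterbank source separator.
# Assumptions: a small Butterworth filterbank stands in for the paper's 256
# cochlear filters, and the mask is a placeholder for the SNN-derived one.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def filterbank(signal, fs, n_channels=8, f_lo=100.0, f_hi=4000.0):
    """Split a signal into log-spaced bandpass channels."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    bands = [sosfiltfilt(butter(4, [lo, hi], btype="bandpass",
                                fs=fs, output="sos"), signal)
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.stack(bands)                     # (n_channels, n_samples)

fs = 16000
t = np.arange(fs) / fs
mixture = np.sin(2*np.pi*440*t) + 0.5*np.sin(2*np.pi*2000*t)  # toy 2-source mix

bands = filterbank(mixture, fs)
mask = np.zeros(bands.shape[0], dtype=bool)
mask[:4] = True                                # placeholder: keep low channels;
                                               # the paper gets this from binding
separated = (bands * mask[:, None]).sum(axis=0)  # binary gain + resynthesis
```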

    New Trends in Biologically-Inspired Audio Coding

    This book chapter deals with the generation of auditory-inspired spectro-temporal features aimed at audio coding. To do so, we first generate sparse audio representations we call spikegrams, using projections on gammatone or gammachirp kernels that generate neural spikes. Unlike Fourier-based representations, these representations are powerful at identifying auditory events such as onsets, offsets, transients, and harmonic structures. We show that introducing adaptiveness in the selection of gammachirp kernels improves the compression rate compared to the case where the kernels are non-adaptive. We also integrate a masking model that helps reduce the bitrate without loss of perceptible audio quality. We then quantize coding values using a genetic algorithm, which outperforms uniform quantization in this framework. We finally propose a method to extract frequent auditory objects (patterns) in the aforementioned sparse representations. The extracted frequency-domain patterns (auditory objects) help us address spikes (auditory events) collectively rather than individually. When audio compression is needed, the different patterns are stored in a small codebook that can be used to efficiently encode audio materials in a lossless way. The approach is applied to different audio signals, and results are discussed and compared. This work is a first step towards the design of a high-quality auditory-inspired "object-based" audio coder.
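    As an illustration of how spikegram-style representations can be built, the following sketch runs greedy matching pursuit over a small dictionary of gammatone kernels, emitting one "spike" (kernel index, time offset, amplitude) per iteration. The kernel parameters and fixed spike budget are simplifying assumptions; the chapter's adaptive gammachirp dictionary, masking model, and genetic-algorithm quantizer are not reproduced here.

```python
# Hedged sketch of spikegram-style sparse coding via matching pursuit
# over gammatone kernels; not the chapter's exact dictionary or stopping rule.
import numpy as np

def gammatone(fc, fs, dur=0.02, order=4, bw=125.0):
    """Unit-norm gammatone kernel; the fixed bandwidth is a simplification
    (cochlear models scale bandwidth with centre frequency)."""
    t = np.arange(int(dur * fs)) / fs
    g = t**(order - 1) * np.exp(-2*np.pi*bw*t) * np.cos(2*np.pi*fc*t)
    return g / np.linalg.norm(g)

def spikegram(signal, kernels, n_spikes=50):
    """Greedy matching pursuit: each iteration emits one 'spike'
    (kernel index, time offset, amplitude) and subtracts its contribution."""
    residual = signal.astype(float).copy()
    spikes = []
    for _ in range(n_spikes):
        scores = [np.correlate(residual, k, mode="valid") for k in kernels]
        best = max(range(len(kernels)), key=lambda i: np.abs(scores[i]).max())
        tau = int(np.abs(scores[best]).argmax())
        amp = scores[best][tau]
        residual[tau:tau + len(kernels[best])] -= amp * kernels[best]
        spikes.append((best, tau, amp))
    return spikes, residual

fs = 16000
kernels = [gammatone(fc, fs) for fc in (200, 500, 1000, 2000)]
t = np.arange(fs // 4) / fs
sig = np.sin(2*np.pi*500*t) * np.exp(-5*t)       # decaying tone as test input
spikes, residual = spikegram(sig, kernels)
print(f"{len(spikes)} spikes, residual energy {np.sum(residual**2):.4f}")
```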

    A parallel supercomputer implementation of a biologically inspired neural network and its use for pattern recognition

    Abstract: A parallel implementation of a large spiking neural network is proposed and evaluated. The neural network implements the binding-by-synchrony process using the Oscillatory Dynamic Link Matcher (ODLM). Scalability, speed, and performance are compared for two implementations: Message Passing Interface (MPI) and Compute Unified Device Architecture (CUDA), running on clusters of multicore supercomputers and on NVIDIA graphics processing units respectively. A global spiking list that represents the state of the neural network at each instant is described. This list indexes each neuron that fires during the current simulation time, so that the influence of their spikes is processed simultaneously on all computing units. Our implementation shows good scalability for very large networks. A complex and large spiking neural network has been implemented in parallel with success, paving the way towards real-life applications based on networks of spiking neurons. MPI offers better scalability than CUDA, while the CUDA implementation on a GeForce GTX 285 gives the best cost-to-performance ratio. When running the neural network on the GTX 285, the processing speed is comparable to the MPI implementation on RQCHP's Mammouth parallel supercomputer with 64 nodes (128 cores).
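    The global spiking list is essentially a per-timestep index of firing neurons whose influence is then applied in one bulk update. Below is a sequential NumPy sketch of that data structure; the actual work distributes the update across MPI ranks or CUDA threads, and the dynamics here are illustrative placeholders, not the ODLM's.

```python
# Sequential sketch of the 'global spiking list' idea: gather the indices of
# all neurons that fire this step, then apply their influence in one update.
# Weights, drive, and reset rule are illustrative, not the ODLM's dynamics.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
weights = rng.normal(0.0, 0.05, (n, n))      # random synaptic weight matrix
potential = rng.uniform(0.0, 1.0, n)         # membrane potentials
threshold = 1.0

for step in range(100):
    spiking_list = np.flatnonzero(potential >= threshold)   # global spike list
    # the influence of all listed spikes is processed simultaneously;
    # a parallel version splits the rows of 'weights' across computing units
    potential += weights[:, spiking_list].sum(axis=1)
    potential[spiking_list] = 0.0             # reset neurons that fired
    potential += 0.02                         # constant drive toward threshold
```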

    Nonlinear speech processing with oscillatory neural networks for speaker segregation

    Nonlinear masking of space-time representations of speech is a widely used technique for speech processing. In the present work we use an AM representation of cochlear filterbank signals in combination with a mask that is derived from a network of oscillatory neurons. The proposed approach does not need any training or learning, and the mask takes into account the dependence between points of the auditory-derived representation. A potential application is illustrated in the context of speaker segregation.
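    A hedged sketch of the AM representation itself: the envelope of one cochlear channel is extracted with a Hilbert transform and low-pass smoothing, then a mask is applied. The threshold mask below is only a stand-in for the mask the paper derives from the oscillatory neural network.

```python
# AM (envelope) extraction for one filterbank channel, plus masking.
# The threshold mask is an assumption standing in for the oscillator-derived one.
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 16000
t = np.arange(fs) / fs
channel = np.sin(2*np.pi*800*t) * (1 + 0.5*np.sin(2*np.pi*4*t))  # toy AM carrier

envelope = np.abs(hilbert(channel))          # instantaneous amplitude (AM)
sos = butter(2, 50, btype="low", fs=fs, output="sos")
am = sosfiltfilt(sos, envelope)              # smooth down to modulation rates

mask = am > am.mean()                        # stand-in for the oscillator mask
masked = channel * mask                      # nonlinear masking of the channel
```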

    Double-vowel Segregation Through Temporal Correlation: A Bio-inspired Neural Network Paradigm

    A two-layer spiking neural network is used to segregate double vowels. The first layer is a partially connected network of spiking neurons of the relaxation oscillator type, while the second layer consists of fully connected relaxation oscillators. A two-dimensional auditory image generated from the enhanced spectrum of cochlear filter bank envelopes is computed. The segregation is based on a channel selection strategy: at each instant, each channel is assigned to one of the sources present in the auditory scene, i.e., the speakers. No prior estimation of the pitch of the underlying sources is necessary.
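    The channel selection idea can be illustrated with plain temporal correlation: channels whose envelopes co-modulate are grouped as one source. In the sketch below, correlation clustering stands in for the relaxation-oscillator synchrony the paper actually uses, and all signals are synthetic.

```python
# Channel selection by temporal correlation: correlated channel envelopes are
# assigned to the same source. A stand-in for oscillator synchrony, on toy data.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 8000)
env_a = 1 + 0.8*np.sin(2*np.pi*3*t)          # modulation of "vowel A" channels
env_b = 1 + 0.8*np.sin(2*np.pi*7*t)          # modulation of "vowel B" channels
channels = np.stack([env_a + 0.1*rng.normal(size=t.size) for _ in range(4)] +
                    [env_b + 0.1*rng.normal(size=t.size) for _ in range(4)])

corr = np.corrcoef(channels)                 # temporal correlation matrix
seed = 0                                     # pick one channel as source-1 seed
labels = (corr[seed] < 0.5).astype(int)      # assign each channel to a source
print("channel-to-source assignment:", labels)
```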